feat(slides): Slide.duplicate — slide CRUD Phase 2 (closes upstream #132 in fork)#34
Merged
Conversation
…e 2) Closes the Phase 2 sub-task of the slide-CRUD epic (#11) and the 12-year-old most-commented upstream feature request scanny#132 in this fork. API --- - `Slides.duplicate(slide, index=None)` — collection-level operation, returns a `Slide`. `index=None` defaults to `source_index + 1` (immediately after the source). Raises `ValueError` if `slide` is not a member, `IndexError` if `index` is out of range. - `Slide.duplicate(index=None)` — convenience alias delegating to `Slides.duplicate(self, index)`. Mirrors `Slide.delete()` → `Slides.remove(self)` from Phase 1. Design ------ The implementation walks the source slide's rels and decides per reltype whether to share the target part (image, media, slide-layout, slide-master, theme — package-level dedup) or deep-copy it (chart, OLE-object, embedded-package, notes-slide). For chart parts the chart XML is `copy.deepcopy`'d but the embedded xlsx workbook is *blob- copied* — the workbook is the chart's data and `<c:numCache>` values in the chart XML mirror it; deepcopy of XML would not capture the binary payload. After deepcopy, the duplicated `<p:sld>` still references the source's `rId`s. We walk every descendant element via `lxml.iter()` and substitute any attribute whose name is in the OOXML relationships namespace — catches `r:id`, `r:embed`, `r:link`, `r:pict`, `r:href` in one pass without enumerating local-names (the local-name approach silently misses `r:link` on linked images, which is the most common real-world failure mode of community recipes). Notes-slide handling addresses upstream gotcha scanny#961: a NotesSlidePart back-references its parent slide via `RT.SLIDE`. The duplicate's notes-slide MUST point at the new slide, not the source. The new helper `duplicate_notes_slide_for(src, new)` rewires the back-rel and is invoked AFTER the new slide part is registered in presentation rels. Comments parts (`RT.COMMENTS` / `RT.COMMENT_AUTHORS`) are dropped on duplicate by an explicit reltype filter — comments support is out of scope for Phase 2 of #11. A test documents the drop so the policy doesn't silently change. Test coverage ------------- - `tests/test_slide_duplicate.py` — 39 new tests across 7 describe classes covering the API surface (existence, signature, raises, index variants, alias delegation), the part-graph invariants (new unique partname / rId / sldId, layout-part shared with source, modification isolation, image-dedup invariant unchanged), the notes-slide path (own copy, back-ref rewired, isolation), the defensive XPath `r:*` check (no unmapped rIds), and round-trip through save/reopen. - `features/sld-duplicate.feature` + 7 new step impls — 4 acceptance scenarios for default-index, explicit-index, alias, and IndexError paths. Verification ------------ ``` $ python3 -m pytest tests/ -q 3125 passed in 4.02s $ python3 -m ruff format src tests 203 files left unchanged $ python3 -m ruff check src tests All checks passed! $ python3 -m behave features/ --no-color 990 scenarios passed, 0 failed, 0 skipped ``` UAT script `uat_slide_duplicate.py` (untracked at repo root per CLAUDE.md §6): builds a 3-slide deck with a textbox, a picture, and speaker notes; exercises both `Slides.duplicate` and `Slide.duplicate`; round-trips; prints structural summary (image_part_count = 2 before AND after — dedup invariant holds). Out of scope (deferred per epic split): - `Presentation.append_from` — Phase 3 (cross-deck rels). - `Presentation.sections` — Phase 4 (`<p14:sectionLst>`). Refs #11.
MHoroszowski
added a commit
that referenced
this pull request
May 7, 2026
* feat(slides): add slide CRUD Phase 1 — remove, move, indexed add_slide First slice of issue #11 (slide CRUD epic). Lands the in-deck CRUD operations users have been asking for since 2014; full duplicate(), cross-deck append_from(), and Sections support stay deferred to Phases 2–4 in the epic. New public API -------------- - `Slides.add_slide(slide_layout, index=None)` — `index=None` preserves existing append behavior (backwards compatible). Integer `index` inserts at that zero-based position; `index=len(self)` is a valid append, negatives raise `IndexError`. - `Slides.move(slide, new_index)` — reorder a slide in the presentation's slide sequence. Raises `ValueError` for foreign slides, `IndexError` for out-of-range `new_index`. - `Slides.remove(slide)` — drop a slide from the collection. The presentation→slide relationship is removed; the underlying part falls out on save. Image and other media parts shared with surviving slides are preserved (covered by integration test). - `Slide.delete()` — convenience alias for `prs.slides.remove(self)`. API shape mirrors the existing `SlideLayouts.remove(slide_layout)` shape rather than upstream PR scanny#1029's `remove_slide(slide_id)` — instance-keyed is more Pythonic and matches the rest of the library. Internal additions ------------------ - `CT_SlideIdList.insert_sldId_at(rId, idx)` — index-aware sldId insertion, with `idx == len` allowed. - `CT_SlideIdList.move_sldId_to(sldId, new_idx)` — reposition an existing sldId; no-op when the index already matches. - `CT_SlideIdList.remove_sldId(sldId)` — bounded remove with a parent-of check. Test coverage ------------- - 32 new unit tests in `tests/test_slide_crud.py` covering oxml helpers, the Slides CRUD methods, the Slide.delete alias, and four full-pipeline round-trip integration tests (add-at-index, remove, delete, move). Plus a shared-image-preservation round-trip and a `index=len(slides)` boundary test added per advisor review. - 5 new behave scenarios in `features/sld-crud.feature` exercising add-at-head, add-in-middle, move, remove, and Slide.delete. - New UAT script (untracked) at `uat_slide_crud.py` builds a deck exercising every new entry-point, saves, reopens, and prints a per-slide read-back so the maintainer can compare in PowerPoint / Keynote per CLAUDE.md §6. Tangential test-isolation fix ----------------------------- `tests/opc/test_package.py:DescribePartFactory` mutates `PartFactory.part_type_for[CT.PML_SLIDE]` to a class_mock and never reverts it. The mock leaked into any later test that loaded slides through `Presentation()` — caught here by the new round-trip integration tests. Restored via a request finalizer; isolated to that one test, no behavior change for the production code path. Verification ------------ ``` $ python3 -m pytest tests/ -q | tail -3 3086 passed in 3.76s $ ruff check src tests | tail -3 All checks passed! $ python3 -m behave features/ --no-color | tail -3 986 scenarios passed, 0 failed, 0 skipped 2952 steps passed, 0 failed, 0 skipped ``` Refs #11 * feat(slides): add Slide.duplicate / Slides.duplicate (slide-CRUD Phase 2) (#34) Closes the Phase 2 sub-task of the slide-CRUD epic (#11) and the 12-year-old most-commented upstream feature request scanny#132 in this fork. API --- - `Slides.duplicate(slide, index=None)` — collection-level operation, returns a `Slide`. `index=None` defaults to `source_index + 1` (immediately after the source). Raises `ValueError` if `slide` is not a member, `IndexError` if `index` is out of range. - `Slide.duplicate(index=None)` — convenience alias delegating to `Slides.duplicate(self, index)`. Mirrors `Slide.delete()` → `Slides.remove(self)` from Phase 1. Design ------ The implementation walks the source slide's rels and decides per reltype whether to share the target part (image, media, slide-layout, slide-master, theme — package-level dedup) or deep-copy it (chart, OLE-object, embedded-package, notes-slide). For chart parts the chart XML is `copy.deepcopy`'d but the embedded xlsx workbook is *blob- copied* — the workbook is the chart's data and `<c:numCache>` values in the chart XML mirror it; deepcopy of XML would not capture the binary payload. After deepcopy, the duplicated `<p:sld>` still references the source's `rId`s. We walk every descendant element via `lxml.iter()` and substitute any attribute whose name is in the OOXML relationships namespace — catches `r:id`, `r:embed`, `r:link`, `r:pict`, `r:href` in one pass without enumerating local-names (the local-name approach silently misses `r:link` on linked images, which is the most common real-world failure mode of community recipes). Notes-slide handling addresses upstream gotcha scanny#961: a NotesSlidePart back-references its parent slide via `RT.SLIDE`. The duplicate's notes-slide MUST point at the new slide, not the source. The new helper `duplicate_notes_slide_for(src, new)` rewires the back-rel and is invoked AFTER the new slide part is registered in presentation rels. Comments parts (`RT.COMMENTS` / `RT.COMMENT_AUTHORS`) are dropped on duplicate by an explicit reltype filter — comments support is out of scope for Phase 2 of #11. A test documents the drop so the policy doesn't silently change. Test coverage ------------- - `tests/test_slide_duplicate.py` — 39 new tests across 7 describe classes covering the API surface (existence, signature, raises, index variants, alias delegation), the part-graph invariants (new unique partname / rId / sldId, layout-part shared with source, modification isolation, image-dedup invariant unchanged), the notes-slide path (own copy, back-ref rewired, isolation), the defensive XPath `r:*` check (no unmapped rIds), and round-trip through save/reopen. - `features/sld-duplicate.feature` + 7 new step impls — 4 acceptance scenarios for default-index, explicit-index, alias, and IndexError paths. Verification ------------ ``` $ python3 -m pytest tests/ -q 3125 passed in 4.02s $ python3 -m ruff format src tests 203 files left unchanged $ python3 -m ruff check src tests All checks passed! $ python3 -m behave features/ --no-color 990 scenarios passed, 0 failed, 0 skipped ``` UAT script `uat_slide_duplicate.py` (untracked at repo root per CLAUDE.md §6): builds a 3-slide deck with a textbox, a picture, and speaker notes; exercises both `Slides.duplicate` and `Slide.duplicate`; round-trips; prints structural summary (image_part_count = 2 before AND after — dedup invariant holds). Out of scope (deferred per epic split): - `Presentation.append_from` — Phase 3 (cross-deck rels). - `Presentation.sections` — Phase 4 (`<p14:sectionLst>`). Refs #11. Co-authored-by: Matthew Horoszowski <mhoroszowski@Winton.lan> --------- Co-authored-by: Matthew Horoszowski <mhoroszowski@Winton.lan>
MHoroszowski
added a commit
that referenced
this pull request
May 8, 2026
) (#36) Phase 4 of issue #11 (slide CRUD epic). Implements PowerPoint Sections support — read, write, and round-trip the `p:extLst/p:ext{uri=521415D9-…}/p14:sectionLst` extension that PowerPoint uses to organize slides into named groups in the slide pane. The fork's slide CRUD epic is now complete: Phase 1 (#33), Phase 2 (#34), Phase 3 (#35), and Phase 4 (this PR) collectively shipped duplicate / delete / reorder / copy-between-decks / sections across the originally listed sub-features. New public API -------------- - `Presentation.sections` → `_Sections` collection. - `_Sections` is sequence-like: `__len__`, `__iter__`, `__getitem__`, `index()`. Adds: * `add_section(name, after=None) -> Section` — append, or insert immediately after an existing section when `after` is given. * `remove(section)` — drop section; cleans up the wrapping `p14:sectionLst` / `p:ext` / `p:extLst` chain when the last section goes. - `Section` exposes: * `name` (read/write str) * `id` (read-only GUID-with-braces, e.g. `{ABC-…}`) * `slides` (tuple of |Slide|, in section order) * `add_slide(slide)` — assign or move slide into this section (a slide can belong to at most one section) * `remove_slide(slide)` — drop a slide's section assignment; slide remains in the presentation. Membership invariants --------------------- Section membership references slides by their stable `p:sldId/@id` integer (NOT by `r:id` and NOT by position), so `Slides.move(...)`, indexed `add_slide(...)`, and `Slides.remove(...)` preserve assignment without any extra plumbing. Removed slides become orphan ids in the section's `p14:sldIdLst`; `Section.slides` silently skips them on read but the XML round-trips them untouched (deliberate — matching python-pptx's "preserve foreign data" doctrine). PowerPoint compatibility ------------------------ - Empty sections emit `<p14:sldIdLst/>` so all PowerPoint versions treat them consistently (some interpret an omitted `sldIdLst` as "all unsectioned slides," which is not what we mean). - Section ids generated as `{<UPPER-CASE-GUID>}` matching PowerPoint's wire shape. - Foreign `p:extLst/p:ext` siblings (`p15:*`, modification tracking, etc.) round-trip untouched — the section ext lives alongside, not in place of. Internal additions ------------------ - New `pptx/sections.py` module hosting `Section` and `_Sections` proxy classes. - `pptx/oxml/presentation.py` extended with `CT_PresentationExtensionList`, `CT_PresentationExtension`, `CT_SectionList`, `CT_Section`, `CT_SectionSlideIdList`, `CT_SectionSlideId`, plus `SECTION_LIST_EXT_URI` constant and `CT_Presentation.{section_list, get_or_add_section_list, remove_section_list}` helpers. - `pptx/oxml/ns.py` registers the `p14` prefix (`http://schemas.microsoft.com/office/powerpoint/2010/main`). - `pptx/oxml/__init__.py` registers the new element classes. - `p:extLst` added as a successor entry on the existing ZeroOrOne `sldMasterIdLst`/`sldIdLst`/`sldSz` slots so insert ordering is correct. Tangential test-infra fix ------------------------- `tests/unitutil/cxml.py` namespace-prefix grammar widened from `Word(alphas)` to `Word(alphas, alphanums)` so test fixtures can address `p14:`, `w14:`, `o15:`, etc. Local-name grammar already supported alphanums. Test coverage ------------- - 56 new unit tests in `tests/test_sections.py` covering oxml helpers, the `Section`/`_Sections` proxies, and 8 round-trip integration tests (no-sections baseline, names + membership, GUID preservation, membership-survives-move, removal pruning, orphan preservation, empty-section, unicode/XML-special name, unsectioned-on-section-remove, sibling-ext preservation). - 5 new behave scenarios in `features/sld-sections.feature` (default empty, add, slide membership, move-preserves-membership, remove cleans up extLst). - New `uat_slide_sections.py` (untracked per repo §6) builds a 6-slide / 3-section deck, demonstrates membership preservation through a slide move, and prints a structural read-back. Verification ------------ ``` $ python3 -m pytest tests/ -q | tail -3 3222 passed in 4.33s $ ruff check src tests | tail -3 All checks passed! $ python3 -m behave features/ --no-color | tail -3 999 scenarios passed, 0 failed, 0 skipped 3000 steps passed, 0 failed, 0 skipped ``` Closes #11 Co-authored-by: Matthew Horoszowski <mhoroszowski@Winton.lan>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Phase 2 of slide CRUD epic (#11) —
Slide.duplicate/Slides.duplicate. Closes the 12-year-old most-commented upstream feature request scanny/python-pptx#132 in this fork.API
Slides.duplicate(slide, index=None) -> Slide— collection-level operation.index=Nonedefaults tosource_index + 1(immediately after the source — duplicates are most often siblings, not tail-appends). RaisesValueErrorifslideis not a member of the collection;IndexErrorifindexis out of range.Slide.duplicate(index=None) -> Slide— convenience alias delegating toSlides.duplicate(self, index). MirrorsSlide.delete()→Slides.remove(self)from Phase 1.Design — three load-bearing decisions
1. Namespace-filtered rId substitution (not local-name enumeration)
After
copy.deepcopyof<p:sld>, the duplicate still references the source'srIds. The remap pass walks every descendant vialxml.iter()and substitutes any attribute whose name is in the OOXML relationships namespace (http://schemas.openxmlformats.org/officeDocument/2006/relationships). One pass catchesr:id,r:embed,r:link,r:pict,r:href— and any future attribute Microsoft adds to the same namespace.Filtering by local-name (the approach in the widely-cited Stack Overflow recipe) silently misses
r:linkon linked images — the most common real-world failure mode of community recipes.A defensive test (
DescribeSlideDuplicate_RIdRemap::it_resolves_every_rId_reference_in_the_duplicate_xml) asserts that every rels-namespace attribute on the duplicate resolves to a rel on the new slide part. Cheap, catches every "I missed an attribute" regression.2. Embedded xlsx is binary, not XML
ChartPart deep-copy uses
copy.deepcopyon the chart_element, but the embeddedMicrosoft_Excel_Worksheet1.xlsx(reached viaRT.PACKAGE) is itself an OPC package — opaque bytes from python-pptx's perspective. It must be blob-copied, not XML-deepcopied. The chart<c:numCache>values mirror what's in the xlsx; copy chart XML as-is, copy the xlsx blob as-is, remap the rId between them — done.The helper
_duplicate_blob_partencapsulates this for any binary part (embeddedpackage, oleObject).3. Notes-slide back-rel rewire (community gotcha)
A
NotesSlidePartback-references its parent slide viaRT.SLIDE. If you blindly carry that rel through, the duplicate's notes-slide back-references the source slide, not the duplicate — PowerPoint accepts the file but Save-Notes-As-PDF and outline-view actions misbehave. This is the root cause of upstream issue scanny/python-pptx#961.The implementation drops
RT.NOTES_SLIDEfrom the slide-rels walk and creates a freshNotesSlidePartat theSlides.duplicatelevel (helperduplicate_notes_slide_for) with the back-rel rewired to point at the new slide.What's shared vs deep-copied
RT.IMAGE,RT.MEDIA,RT.VIDEO,RT.AUDIORT.SLIDE_LAYOUT,RT.SLIDE_MASTER,RT.THEMERT.HYPERLINK(external)RT.CHARTRT.OLE_OBJECT,RT.PACKAGERT.NOTES_SLIDERT.COMMENTS,RT.COMMENT_AUTHORSOut of scope (deferred per epic split)
Presentation.append_from(other_pres, slide_indexes=None)— Phase 3 (cross-deck rels resolution)Presentation.sections(<p14:sectionLst>) — Phase 4RT.COMMENTS/RT.COMMENT_AUTHORS) — explicitly dropped by_DUP_DROP_RELTYPES_SLIDEfilter; revisit when the comments API surface landsTests
tests/test_slide_duplicate.pyacross 7 describe classes:DescribeSlides_Duplicate— API surface, signature, raises, index variantsDescribeSlide_Duplicate— convenience alias delegationDescribeSlideDuplicate_PartGraph— new partname, new rId, layout shared, modification isolation, XML equivalenceDescribeSlideDuplicate_ImageDedup— package-level dedup invariant + round-trip with pictureDescribeSlideDuplicate_NotesSlide— own copy, content carried, edits isolated, no-notes case, back-rel rewired, round-tripDescribeSlideDuplicate_RIdRemap— defensive XPath check on every rels-namespace attributeDescribeSlideDuplicate_RoundTrip— open → duplicate → save → reopen across all pathsfeatures/sld-duplicate.feature— default-index, explicit-index, alias, IndexError.UAT
uat_slide_duplicate.pyat the repo root builds a 3-slide deck with a textbox, a picture, and speaker notes; exercises bothSlides.duplicate(default + explicit index) andSlide.duplicate(alias); round-trips through save/reopen; prints the structural summary. Output showsimage_part_count = 2before AND after duplicate (dedup invariant holds),slide_part_count = 6(3 originals + 3 duplicates), notes correctly carried only to duplicates of slides that originally had them. Maintainer signoff received.Known coverage gaps (deferred)
ISC-19 (chart copy live probe) and ISC-20 (OLE-object copy live probe) are marked
[DEFERRED-VERIFY]in the working ISA — the code paths exist (_duplicate_chart_part,_duplicate_blob_part) and are exercised by the rels walk, but no Phase 2 fixture seeds a slide with a chart shape or an OLE object. Follow-up: addtests/test_slide_duplicate_chart.pywhen a fixture exercisingadd_chart(...)ships, or as part of Phase 3 / a dedicated 2.5 PR.Closes the duplicate sub-task of #11. Refs #11.